    Resolving Biological Trajectories in Single-cell Data using Feature Selection and Multi-modal Integration

    Single-cell technologies can readily measure the expression of thousands of molecular features from individual cells undergoing dynamic biological processes, such as cellular differentiation, immune response, and disease progression. While computational trajectory inference methods and RNA velocity approaches have been developed to study how subtle changes in gene or protein expression impact cell fate decision-making, identifying the characteristic features that drive continuous biological processes remains difficult because of the inherent biological and technical challenges associated with single-cell data. Here, we developed two data representation-based approaches for improving inference of cellular dynamics. First, we present DELVE, an unsupervised feature selection method for identifying a representative subset of dynamically expressed molecular features that resolve cellular trajectories in noisy data. In contrast to previous work, DELVE uses a bottom-up approach to mitigate the effect of unwanted sources of variation that confound inference, and it models cell states from dynamic feature modules that constitute core regulatory complexes. Using simulations, single-cell RNA sequencing data, and iterative immunofluorescence imaging data in the context of cell cycle and cellular differentiation, we demonstrate that DELVE selects genes or proteins that more accurately characterize cell populations and improve the recovery of cell type transitions. Next, we present the first task-oriented benchmarking study that investigates integration of temporal gene expression modalities for dynamic cell state prediction. We benchmark ten multi-modal integration approaches on ten datasets spanning different biological contexts, sequencing technologies, and species. This study illustrates how temporal gene expression modalities can be optimally combined to improve inference of cellular trajectories and more accurately predict sample-associated perturbation and disease phenotypes. Lastly, we illustrate an application of these approaches by performing an integrative analysis of gene expression and RNA velocity data to study the crosstalk between signaling pathways that govern the mesendoderm fate decision during directed definitive endoderm differentiation. Results of this study suggest that lineage-specific, temporally expressed genes within the primitive streak may serve as potential targets for increasing the efficiency of definitive endoderm differentiation. Collectively, this work uses scalable data-driven approaches to effectively manage the inherent biological and technical challenges associated with single-cell data in order to improve inference of cellular dynamics.

    Doctor of Philosophy